This paper compares fuzzy kernel k-medoids using the radial basis function 
(RBF) and polynomial kernel functions in hepatitis classification. These two 
kernel functions were chosen due to their popularity in kernel-based 
machine learning methods for solving classification tasks. The hepatitis 
dataset was then used to evaluate the performance of both methods, which are 
expected to provide an accurate diagnosis so that patients can obtain treatment at an 
early phase. The data were obtained from two hospitals in Indonesia, 
consisting of 89 hepatitis-B and 31 hepatitis-C samples. The data were 
analyzed using several cases of k-fold cross-validation, and the performances 
were compared according to their accuracy, sensitivity, precision, F1-Score, 
and running time. From the experiments, it was concluded that fuzzy kernel 
k-medoids using the RBF kernel function performs better than with the polynomial 
kernel function, with a 6% increase in accuracy, a 13% increase in 
sensitivity, and a 5% improvement in F1-Score. On the other hand, the 
precision of fuzzy kernel k-medoids using the polynomial kernel function is 2% 
higher than with the RBF kernel function. According to the results, the choice 
of RBF or polynomial kernel function in fuzzy kernel k-medoids can be 
made according to the primary goal of the classification. 

1. INTRODUCTION 

Hepatitis is a severe health problem and one of the leading causes of death across the globe. 
According to the Global Hepatitis Report 2017 [1], approximately 257 million people were living with hepatitis 
B and 71 million with hepatitis C in 2015. In Indonesia, the prevalence of clinical hepatitis was 
estimated at 0.6% in 2007 [2]. These kinds of viral hepatitis tend to become chronic, thereby causing more 
deaths. The prevention of viral hepatitis, as stated by Hou et al. [3], consists of behavior 
modification, passive immunoprophylaxis, and active immunization. Various machine learning techniques have 
also been used for early detection of viral hepatitis, which is expected to help patients receive 
treatment in the earlier phase of the infection, thereby stopping the virus from multiplying [4]. 

Some researchers have published work on the use of machine learning in hepatitis classification [4-7]. In this 
paper, fuzzy kernel k-medoids is used for hepatitis classification to make the diagnosis more 
accurate. The kernel technique, which was introduced by Vapnik [8] and later developed by 
Schölkopf et al. [9] and Cristianini [10], is used in fuzzy kernel k-medoids to handle 
data sets that may not be linearly separable. Fuzzy kernel k-medoids has previously been used in 
problems related to anomaly detection [11] and on multiple data sets such as Wisconsin breast cancer, 
diabetes, image segmentation, iris, and many more [12]. Furthermore, kernel-based machine learning 
methods have previously been used in diagnosing several diseases and delivered excellent accuracy [13-17]. 
The kernel function helps avoid misclassification when the dataset has a spherical shape, which cannot be 
separated by a linear function alone. 


2. RESEARCH METHOD 
2.1. Material 

The hepatitis dataset, which was also used by Kurniawan and Rustam [18], was obtained from 
Tangerang and Mitra Keluarga Kelapa Gading Hospitals, consisting of 89 hepatitis B and 31 hepatitis C 
samples. Each sample is described by features such as gender, serum glutamic oxaloacetic transaminase 
(SGOT), serum glutamic pyruvic transaminase (SGPT), anti-HCV, HBsAg, urea, and creatinine. All of these 
features will be used in the process of classification. 


2.2. Method 
2.2.1. Fuzzy kernel k-medoids 

This method is a combination of three concepts [11]: fuzzy k-medoids, proposed by 
Krishnapuram et al. [19]; the kernel function, which was introduced by Vapnik et al. [8]; and the fuzziness 
degree [20]. Consider a dataset $X = \{x_1, x_2, \ldots, x_n\}$ where $x_i \in \mathbb{R}^d$ for $i = 1, 2, \ldots, n$. The objective 
function of this method is given in (1), where $u_{ij}$ denotes the membership value of the sample $x_i$ in the cluster $j$. 


$$J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} K(x_i, v_j) \qquad (1)$$


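The membership value $u_{ij}$ is updated using the formula in (2), and the medoid $v_j$ is calculated using the 
formula in (3). 

$$u_{ij} = \frac{\left(K(x_i, v_j)\right)^{-\frac{1}{m-1}}}{\sum_{k=1}^{c} \left(K(x_i, v_k)\right)^{-\frac{1}{m-1}}} \qquad (2)$$

$$v_j = x_p \quad \text{where} \quad p = \arg\min_{1 \le q \le n} \sum_{i=1}^{n} u_{ij}^{m} K(x_q, x_i) \qquad (3)$$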
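
As a brief check on how the membership update in (2) relates to the objective in (1), the following sketch assumes that the objective is minimised over the memberships subject to the usual fuzzy-clustering constraint $\sum_{j=1}^{c} u_{ij} = 1$ for every sample, and that $K(x_i, v_j) > 0$. The constraint and the multiplier $\lambda_i$ are assumptions introduced here for illustration; they are not stated in the text.

```latex
\begin{align*}
L(U, \lambda) &= \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} K(x_i, v_j)
  - \sum_{i=1}^{n} \lambda_i \Big( \sum_{j=1}^{c} u_{ij} - 1 \Big), \\
\frac{\partial L}{\partial u_{ij}} &= m\, u_{ij}^{m-1} K(x_i, v_j) - \lambda_i = 0
  \;\Longrightarrow\;
  u_{ij} = \Big( \frac{\lambda_i}{m\, K(x_i, v_j)} \Big)^{\frac{1}{m-1}}, \\
\sum_{j=1}^{c} u_{ij} = 1 &\;\Longrightarrow\;
  \Big( \frac{\lambda_i}{m} \Big)^{\frac{1}{m-1}}
  = \frac{1}{\sum_{k=1}^{c} K(x_i, v_k)^{-\frac{1}{m-1}}}, \\
u_{ij} &= \frac{K(x_i, v_j)^{-\frac{1}{m-1}}}{\sum_{k=1}^{c} K(x_i, v_k)^{-\frac{1}{m-1}}}.
\end{align*}
```

Under these assumptions, a sample receives a larger membership in the cluster whose medoid has the smaller value of $K(x_i, v_j)$, and the memberships of each sample sum to one.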


The algorithm of fuzzy kernel k-medoids [11] is given in Figure 1. 


Input: $X = \{x_1, x_2, \ldots, x_n\}$, $c$, $m_i$, $m_f$, $\varepsilon$, $T$ (the maximum number of iterations allowed). 
Output: $V = \{v_1, v_2, \ldots, v_c\}$, $U = [u_{ij}]$, where $1 \le i \le n$, $1 \le j \le c$. 
1. Initialization: $V^0 = \{v_1, v_2, \ldots, v_c\}$. 
2. $m = m_i + \frac{t}{T}(m_f - m_i)$. 
3. Update the membership of the data point $x_i$ in the $j$-th cluster using (2). 
4. Update the medoids $v_j$ using (3). 
5. If $E = \sum_{j=1}^{c} \left(K(v_j^{t}, v_j^{t-1})\right)^2 < \varepsilon$ or $t = T$, then the iteration stops. Otherwise, $t = t + 1$ and go back to step 2. 
End. 


Figure 1. Algorithm of fuzzy kernel k-medoids 
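
The steps in Figure 1 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the quantity that acts as a dissimilarity in (1)-(3) is taken here to be the kernel-induced squared distance $K(x,x) - 2K(x,y) + K(y,y)$ computed from a precomputed Gram matrix, the initial medoids are drawn at random, and the function names and the default values of `m_i`, `m_f`, `tol`, and `T` are hypothetical.

```python
import numpy as np

def kernel_distance(gram):
    """Kernel-induced squared distance from a Gram matrix (assumption of this sketch)."""
    diag = np.diag(gram)
    d2 = diag[:, None] - 2.0 * gram + diag[None, :]
    return np.maximum(d2, 0.0)  # clip tiny negative values caused by round-off

def update_membership(dist_to_medoids, m, eps=1e-12):
    """Membership update in the spirit of (2); every row sums to one."""
    d = dist_to_medoids + eps                   # avoid division by zero at a medoid
    ratio = d / d.min(axis=1, keepdims=True)    # >= 1, so the power cannot overflow
    inv = ratio ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def update_medoids(dist, u, m):
    """Medoid update in the spirit of (3): for each cluster, pick the sample
    that minimises the membership-weighted sum of dissimilarities."""
    cost = dist @ (u ** m)          # cost[q, j] = sum_i u[i, j]^m * dist[q, i]
    return np.argmin(cost, axis=0)  # medoid index for each cluster

def fuzzy_kernel_k_medoids(gram, c, m_i=1.1, m_f=2.0, tol=1e-6, T=50, seed=0):
    rng = np.random.default_rng(seed)
    dist = kernel_distance(gram)
    n = dist.shape[0]
    medoids = rng.choice(n, size=c, replace=False)        # step 1 (random start assumed)
    for t in range(1, T + 1):
        m = m_i + (t / T) * (m_f - m_i)                   # step 2
        u = update_membership(dist[:, medoids], m)        # step 3
        new_medoids = update_medoids(dist, u, m)          # step 4
        change = np.sum(dist[new_medoids, medoids] ** 2)  # step 5: movement of medoids
        medoids = new_medoids
        if change < tol:
            break
    return medoids, u
```

The function returns the indices of the medoids within the input data together with the membership matrix from the last iteration. Working from a Gram matrix keeps the sketch independent of the kernel choice, so the same loop can be run with either kernel function.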


This method utilized the RBF and polynomial kernel functions. The RBF kernel is widely used because 
of its simplicity, as it has few hyperparameters. The number of hyperparameters in a kernel usually 
influences the complexity of model selection [21]. Meanwhile, the polynomial kernel is also 
commonly used, mainly with lower polynomial degrees, because a polynomial of infinite degree 
has the same form as the Gaussian RBF kernel [22]. The polynomial kernel has more 
hyperparameters than the RBF kernel. The formulas [23] are shown in (4) and (5), respectively. 

RBF kernel function: $K(x_i, v_j) = \exp\left(-\frac{\|x_i - v_j\|^2}{2\sigma^2}\right)$ (4) 
Polynomial kernel function: $K(x_i, v_j) = (x_i \cdot v_j + 1)^{h}$ (5) 

Here $\sigma$ is the RBF kernel parameter and $h$ is the polynomial degree. 
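
The two kernel functions can be written in a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the $2\sigma^2$ scaling in the RBF kernel is an assumed convention, since other scalings of the Gaussian kernel are also in common use.

```python
import numpy as np

def rbf_kernel(x, v, sigma):
    """Gaussian RBF kernel; the 2*sigma**2 scaling is an assumed convention."""
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def polynomial_kernel(x, v, degree):
    """Polynomial kernel (x . v + 1) raised to the given degree."""
    return (np.dot(x, v) + 1.0) ** degree

# Hypothetical two-feature vectors, for illustration only.
print(rbf_kernel([1, 2], [1, 2], sigma=1.0))        # identical points give 1.0
print(polynomial_kernel([1, 2], [3, 4], degree=2))  # (1*3 + 2*4 + 1)^2 = 144.0
```

The RBF kernel has a single parameter (`sigma`), while the polynomial kernel is controlled by its degree.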


2.2.2. Research methodology 

The k-fold cross-validation [24] is used in this paper to evaluate the fuzzy kernel k-medoids 
algorithm. For example, when 3-fold cross-validation is used, the data of each class are divided into 
three folds. The resulting number of samples in every fold is shown in Table 1. 


Table 1. The number of samples in each of the three folds of the hepatitis dataset 

Fold     Number of hepatitis B samples    Number of hepatitis C samples 
1        30                               10 
2        30                               10 
3        29                               11 
Total    89                               31 
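
The per-class division into folds can be sketched as follows. This is an illustrative sketch only: the function name `stratified_folds`, the random shuffling, and the seed are assumptions, and the fold that receives the extra sample of a class may differ from Table 1.

```python
import numpy as np

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, dividing each class separately."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # np.array_split gives the first (n % k) parts one extra sample.
        for f, part in enumerate(np.array_split(idx, k)):
            folds[f].extend(part.tolist())
    return [np.array(sorted(f)) for f in folds]

# Class sizes taken from the dataset description: 89 hepatitis B, 31 hepatitis C.
labels = np.array([1] * 89 + [2] * 31)   # 1 = hepatitis B, 2 = hepatitis C
folds = stratified_folds(labels, k=3)
for i, f in enumerate(folds, start=1):
    print(i, int(np.sum(labels[f] == 1)), int(np.sum(labels[f] == 2)))
```

With 89 and 31 samples, each fold holds 29 or 30 hepatitis B samples and 10 or 11 hepatitis C samples, which matches the fold sizes in Table 1 up to the ordering of the folds.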


Using k-fold cross-validation for a classification task with fuzzy kernel k-medoids might be 
unfamiliar, because the method is commonly used for clustering, an unsupervised learning task [25] in 
machine learning. In this study, one fold was used to obtain the centroids of the clusters 
according to the algorithm in Figure 1, while the remaining k-1 folds were used to evaluate the method by 
determining the class of every data point according to its nearest centroid. Let the data labeled hepatitis 
B belong to class 1 and the data labeled hepatitis C belong to class 2. If a data point is nearer to the 
centroid of class 1, then its predicted class is hepatitis B; if it is 
nearer to the centroid of class 2, then its predicted class is hepatitis C. 
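
The nearest-centroid assignment described above can be sketched as follows. This is a minimal illustration under assumptions: nearness is measured with the kernel-induced squared distance $K(x,x) - 2K(x,v) + K(v,v)$, the function names are hypothetical, and the coordinates are made-up two-feature points rather than values from the hepatitis dataset.

```python
import numpy as np

def rbf(x, v, sigma=1.0):
    """Gaussian RBF kernel (2*sigma**2 scaling assumed)."""
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def kernel_sq_distance(x, v, kernel):
    """Squared distance between x and v in the kernel feature space."""
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)

def predict_nearest(points, centers, center_labels, kernel):
    """Assign each point the label of its nearest center."""
    preds = []
    for x in points:
        d = [kernel_sq_distance(x, v, kernel) for v in centers]
        preds.append(center_labels[int(np.argmin(d))])
    return np.array(preds)

# Hypothetical centers for class 1 (hepatitis B) and class 2 (hepatitis C).
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
test_points = np.array([[0.5, 0.2], [4.8, 5.1]])
print(predict_nearest(test_points, centers, [1, 2], rbf))  # [1 2]
```

In the evaluation scheme described above, `centers` would come from the fold used to run the algorithm in Figure 1, and `test_points` would be the samples of the remaining k-1 folds.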


2.2.3. Performance measure 

Accuracy, sensitivity, precision, and F1-Score were used as performance measures. They were 
calculated using (6)-(9) from the entries of the confusion matrix. TP is the number of hepatitis-B 
samples correctly diagnosed and TN is the number of hepatitis-C samples correctly diagnosed. Meanwhile, 
FN is the number of hepatitis-B samples incorrectly diagnosed and FP is the number of hepatitis-C samples 
incorrectly diagnosed. 


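$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \qquad (6)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (7)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (8)$$

$$\text{F1-Score} = \frac{2 \times \text{sensitivity} \times \text{precision}}{\text{sensitivity} + \text{precision}} \qquad (9)$$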
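
The four measures in (6)-(9) can be computed directly from the confusion-matrix counts, as in the short sketch below. The function name and the counts in the usage example are hypothetical and are not results from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, precision, and F1-Score as in (6)-(9).

    Assumes the denominators are non-zero (at least one positive sample
    and at least one positive prediction).
    """
    accuracy = (tp + tn) / (tp + tn + fn + fp)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * sensitivity * precision / (sensitivity + precision)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: 30 hepatitis-B and 10 hepatitis-C test samples.
result = classification_metrics(tp=28, tn=6, fp=4, fn=2)
for name, value in result.items():
    print(f"{name}: {100 * value:.2f}%")
```

For these hypothetical counts the accuracy is 85.00%, the sensitivity 93.33%, the precision 87.50%, and the F1-Score 90.32%.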


3. RESULTS AND ANALYSIS 

The performance of fuzzy kernel k-medoids is evaluated using k-fold cross-validation with k = 
3, 5, 7, 10. Both the RBF and polynomial kernel functions are used, with several kernel 
parameters and polynomial degrees examined. The performance of fuzzy kernel k-medoids using the RBF kernel 
function is shown in Table 2. 

According to Table 2, the kernel parameter σ = 0.0001 gives the best accuracy, sensitivity, and 
F1-Score in every cross-validation setting. For this kernel parameter, the highest values of accuracy, 
precision, and F1-Score are obtained when 7-fold cross-validation is used. The performance of fuzzy 
kernel k-medoids using the polynomial kernel function is shown in Table 3. 

Table 2. The performance of fuzzy kernel k-medoids using RBF kernel function 

Evaluation   Performance                        Kernel parameter of RBF (σ) 
method       measure        0.0001   0.001    0.05     0.1      1        5        10       50       100      1000 

3-fold CV    Accuracy       78.89    77.78    77.78    77.50    74.67    72.96    73.49    72.08    71.11    70.44 
             Sensitivity    98.61    97.22    96.30    95.83    90.28    86.81    87.30    84.90    83.02    82.22 
             Precision      79.78    79.55    80.00    80.00    80.45    80.82    81.03    81.09    81.27    81.10 
             F1-Score       88.20    87.50    87.39    87.20    85.08    83.71    84.05    82.95    82.14    81.66 
             Running Time   1.27     1.03     1.13     1.13     1.50     1.13     1.11     1.08     1.06     1.08 

5-fold CV    Accuracy       77.78    76.11    75.56    75.00    74.00    73.15    72.70    72.22    71.85    71.56 
             Sensitivity    100.00   97.86    96.19    95.00    92.86    91.43    90.41    89.46    88.73    88.29 
             Precision      77.78    77.40    77.69    77.78    77.94    77.89    77.99    78.04    78.07    78.03 
             F1-Score       87.50    86.44    85.96    85.53    84.75    84.12    83.74    83.36    83.06    82.84 
             Running Time   0.06     1.44     1.81     1.42     0.69     0.58     0.56     0.56     0.58     0.58 

7-fold CV    Accuracy       82.14    80.95    80.56    80.65    79.76    79.37    79.08    78.87    78.70    78.57 
             Sensitivity    98.57    97.14    96.19    95.71    94.29    93.57    93.06    92.68    92.38    92.14 
             Precision      83.13    82.93    83.13    83.49    83.54    83.62    83.67    83.71    83.74    83.77 
             F1-Score       90.20    89.47    89.18    89.18    88.59    88.31    88.12    87.97    87.85    87.76 
             Running Time   2.08     1.67     1.61     1.48     0.41     0.41     0.39     0.39     0.39     0.42 

10-fold CV   Accuracy       77.78    75.56    74.81    74.44    72.67    72.22    71.90    71.53    71.23    71.00 
             Sensitivity    100.00   97.14    95.24    94.29    91.71    90.71    90.00    89.29    88.73    88.29 
             Precision      77.78    77.27    77.52    77.65    77.35    77.44    77.50    77.52    77.53    77.54 
             F1-Score       87.50    86.08    85.47    85.16    83.92    83.55    83.29    82.99    82.75    82.57 
             Running Time   0.08     1.50     1.63     1.58     1.14     0.97     0.91     0.91     0.92     0.88 


Table 3. The performance of fuzzy kernel k-medoids using polynomial kernel function 

Evaluation   Performance                        Polynomial degree 
method       measure        1        2        3        4        5        6        7        8        9        10 

3-fold CV    Accuracy       70.00    71.11    71.48    69.44    70.67    71.67    72.54    73.06    73.58    73.89 
             Sensitivity    79.17    80.56    81.02    77.78    80.28    81.94    83.13    83.85    84.57    85.00 
             Precision      82.61    82.86    82.94    82.96    82.57    82.52    82.64    82.71    82.78    82.81 
             F1-Score       80.85    81.69    81.97    80.29    81.41    82.23    82.89    83.28    83.66    83.89 
             Running Time   1.16     1.19     1.42     1.30     1.28     1.58     1.31     1.31     1.33     1.36 

5-fold CV    Accuracy       68.89    68.89    69.26    69.17    69.56    70.56    71.27    71.81    72.22    72.56 
             Sensitivity    82.86    82.86    83.33    83.21    83.43    84.52    85.31    85.89    86.35    86.71 
             Precision      78.38    78.38    78.48    78.45    78.71    79.06    79.32    79.50    79.65    79.76 
             F1-Score       80.56    80.56    80.83    80.76    81.00    81.70    82.20    82.58    82.86    83.09 
             Running Time   0.58     0.45     1.39     0.34     1.08     1.00     1.05     1.05     1.06     1.13 

7-fold CV    Accuracy       77.38    78.57    78.17    78.87    78.81    78.97    78.91    78.87    78.70    78.69 
             Sensitivity    90.00    92.14    91.90    91.79    91.14    90.95    90.61    90.36    90.00    89.86 
             Precision      84.00    83.77    83.55    84.26    84.62    84.89    85.06    85.19    85.26    85.35 
             F1-Score       86.90    87.76    87.53    87.86    87.76    87.82    87.75    87.69    87.57    87.54 
             Running Time   0.44     1.64     0.95     1.02     0.95     0.98     1.02     1.05     1.11     1.22 

10-fold CV   Accuracy       68.89    68.33    68.52    69.44    70.44    71.11    71.75    72.36    72.59    72.78 
             Sensitivity    84.29    82.86    82.38    83.21    84.29    85.00    85.51    86.07    86.35    86.57 
             Precision      77.63    77.85    78.28    78.72    79.09    79.33    79.66    79.93    80.00    80.05 
             F1-Score       80.82    80.28    80.28    80.90    81.60    82.07    82.48    82.89    83.05    83.18 
             Running Time   1.30     0.84     1.11     1.14     1.22     1.08     1.58     1.66     13.72    1.66 


Table 3 shows that the tenth polynomial degree achieves the best performance in almost every 
cross-validation setting. The results are more complicated in the 7-fold cross-validation because the highest 
value of each performance measure is obtained at a different polynomial degree. However, further analysis of 
the values across the measures shows that the fourth polynomial degree gives the best overall performance. Therefore, 
fuzzy kernel k-medoids using the RBF kernel function with σ = 0.0001 and using the fourth-degree polynomial kernel 
function are compared, as shown in Figure 2. Comparing the highest values in Tables 2 and 3, we 
can conclude that fuzzy kernel k-medoids using the RBF kernel function performs better than with the polynomial kernel 
function, with a 6% increase in accuracy, a 13% increase in sensitivity, and a 5% improvement in 
F1-Score. On the other hand, the precision of fuzzy kernel k-medoids using the polynomial kernel function is 2% 
higher than with the RBF kernel function. Based on Figure 2, it is concluded that fuzzy kernel k-medoids 
performs better with the RBF than with the polynomial kernel function in accuracy, sensitivity, and F1-Score, 
whereas the polynomial kernel function makes fuzzy kernel k-medoids better in precision. The RBF kernel function 
also performs better in running time. As shown in Table 4, fuzzy kernel k-medoids using the RBF kernel 
function is faster than using the polynomial kernel function in every evaluation method. 

Figure 2. Comparison of fuzzy kernel k-medoids using the RBF kernel function with σ = 0.0001 and using the 
fourth-degree polynomial kernel function 


Table 4. Comparison of the best kernel function in every evaluation method 

Evaluation method    Kernel function                   Running time 
3-fold CV            RBF kernel with σ = 0.001         1.03 
                     1st polynomial kernel             1.16 
5-fold CV            RBF kernel with σ = 0.0001        0.06 
                     4th polynomial kernel             0.34 
7-fold CV            RBF kernel with σ = 10, 50, 100   0.39 
                     1st polynomial kernel             0.44 
10-fold CV           RBF kernel with σ = 0.0001        0.08 
                     2nd polynomial kernel             0.84 


4. CONCLUSION 

Early detection of hepatitis is expected to help patients obtain proper treatment, considering that this 
disease is one of the major causes of death worldwide. There are several types of hepatitis; however, the most 
commonly found cases are hepatitis B and hepatitis C. Therefore, this paper proposed the use of fuzzy kernel 
k-medoids with the RBF and polynomial kernel functions for hepatitis classification. Data were obtained from 
two hospitals in Indonesia, consisting of 89 hepatitis-B and 31 hepatitis-C samples. According to the 
experiments, it is concluded that the RBF kernel function with σ = 0.0001 delivers better performance than the 
fourth-degree polynomial kernel function in fuzzy kernel k-medoids. Furthermore, the comparison shows that the RBF 
kernel improves the performance of fuzzy kernel k-medoids in accuracy, sensitivity, and F1-Score. On the other 
hand, the polynomial kernel makes fuzzy kernel k-medoids better in precision. Even though the proposed 
method already delivered excellent performance, other methods combined with techniques for 
balancing the data can be explored in future work to obtain a better, more accurate, and more precise diagnosis. 